A Survey of Distance Metrics for Nominal Attributes

Authors

  • Chaoqun Li
  • Hongwei Li
Abstract

Many distance-related algorithms, such as k-nearest neighbor learning algorithms and locally weighted learning algorithms, depend on a good distance metric to be successful. In such algorithms, a key problem is how to measure the distance between each pair of instances. In this paper, we provide a survey of distance metrics for nominal attributes, including some basic distance metrics and their improvements based on attribute weighting and attribute selection. Experimental results on all 36 UCI datasets published on the main web site of the Weka platform validate their effectiveness.
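The abstract does not name the specific metrics surveyed. As an illustration only, the sketch below assumes two commonly used choices for nominal attributes: the overlap metric (count of mismatching attributes) and a simplified Value Difference Metric (VDM) with exponent 1 and no attribute weights. The function names and the toy data are hypothetical and are not taken from the paper.

```python
from collections import Counter, defaultdict


def overlap_distance(x, y):
    """Overlap metric: number of attributes on which two instances differ."""
    return sum(1 for a, b in zip(x, y) if a != b)


def fit_vdm(X, labels):
    """Estimate P(class | attribute i = value) from labeled training data."""
    counts = defaultdict(Counter)  # (attribute index, value) -> class counts
    for row, c in zip(X, labels):
        for i, v in enumerate(row):
            counts[(i, v)][c] += 1
    classes = sorted(set(labels))
    probs = {}
    for key, cnt in counts.items():
        total = sum(cnt.values())
        probs[key] = {c: cnt[c] / total for c in classes}
    return probs, classes


def vdm_distance(x, y, probs, classes):
    """Simplified VDM: sum over attributes and classes of |P(c|x_i) - P(c|y_i)|.

    Assumes every attribute value was seen during fit_vdm; an unseen value
    raises KeyError in this sketch.
    """
    d = 0.0
    for i, (a, b) in enumerate(zip(x, y)):
        pa, pb = probs[(i, a)], probs[(i, b)]
        d += sum(abs(pa[c] - pb[c]) for c in classes)
    return d


# Hypothetical toy data: color determines the class, size is irrelevant.
X = [("red", "small"), ("red", "large"), ("blue", "small"), ("blue", "large")]
labels = ["yes", "yes", "no", "no"]
probs, classes = fit_vdm(X, labels)

print(overlap_distance(X[0], X[1]), overlap_distance(X[0], X[2]))
print(vdm_distance(X[0], X[1], probs, classes),
      vdm_distance(X[0], X[2], probs, classes))
```

On this toy data the overlap metric rates both pairs as equally distant, because each pair differs in exactly one attribute. The VDM sketch assigns zero distance to the pair that differs only in the class-irrelevant attribute and a positive distance to the pair that differs in the class-determining one, which is the kind of behavior that motivates probability-based metrics for nominal values.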

Similar resources

Improved Heterogeneous Distance Functions

Instance-based learning techniques typically handle continuous and linear input values well, but often do not handle nominal input attributes appropriately. The Value Difference Metric (VDM) was designed to find reasonable distance values between nominal attribute values, but it largely ignores continuous attributes, requiring discretization to map continuous values into nominal values. This pa...

Probabilistic Distance Measures for Prototype-based Rules

Probabilistic distance functions, including several variants of value difference metrics, minimum risk metric and Short-Fukunaga metrics, are used with prototype-based rules (P-rules) to provide a very concise and comprehensible classification model. Application of probabilistic metrics to nominal or discrete features is straightforward. Heterogeneous metrics that handle continuous attributes wi...

Dissimilarity learning for nominal data

Defining a good distance (dissimilarity) measure between patterns is of crucial importance in many classification and clustering algorithms. While a lot of work has been performed on continuous attributes, nominal attributes are more difficult to handle. A popular approach is to use the value difference metric (VDM) to define a real-valued distance measure on nominal values. However, VDM treats...

A Framework for Optimal Attribute Evaluation and Selection in Hesitant Fuzzy Environment Based on Enhanced Ordered Weighted Entropy Approach for Medical Dataset

Background: In this paper, a generic hesitant fuzzy set (HFS) model for clustering various ECG beats according to weights of attributes is proposed. A comprehensive review of the electrocardiogram signal classification and segmentation methodologies indicates that algorithms which are able to effectively handle the nonstationarity and uncertainty of the signals should be used for ECG analysis. Ex...

Interval MULTIMOORA method with target values of attributes based on interval distance and preference degree: biomaterials selection

A target-based MADM method covers beneficial and non-beneficial attributes besides target values for some attributes. Such techniques are considered as the comprehensive forms of MADM approaches. Target-based MADM methods can also be used in traditional decision-making problems in which beneficial and non-beneficial attributes only exist. In many practical selection problems, some attributes ha...

Journal:
  • JSW

Volume 5

Publication date: 2010